Abstract - Three measures of divergence between vectors in a convex
set of an n-dimensional real vector space are defined in terms of
certain types of entropy functions, and their convexity properties are
studied. Among other results, a classification of the α-order entropies
is obtained from the convexity of these measures. These results have
applications to the measurement of diversity of a discrete probability
distribution and of divergence between two distributions.

1.  INTRODUCTION


One of the most widely used indices of diversity of a multinomial
distribution $x = (x_1,\dots,x_n)$, $x_i \ge 0$, $\sum x_i = 1$, is the
Shannon entropy $H_n(x) = -\sum x_i \log x_i$ (Shannon [10]). The
concavity of $H_n(x)$ provides a decomposition of the total diversity
in a mixed distribution $(x+y)/2$ as

$$H_n\!\left(\frac{x+y}{2}\right) = \frac{1}{2}\,[H_n(x) + H_n(y)] + \frac{1}{2}\,J_n(x,y). \qquad (1.1)$$

The first component $2^{-1}[H_n(x) + H_n(y)]$ in (1.1) is the average
diversity within the distributions, and the second component

$$J_n(x,y) = [-H_n(x) - H_n(y)] - 2\left[-H_n\!\left(\frac{x+y}{2}\right)\right], \qquad (1.2)$$

which we call the Jensen difference arising out of the convex function
$-H_n(x)$, is non-negative, vanishes if and only if $x = y$, and thus
provides a natural measure of divergence between the distributions $x$
and $y$. (See Lewontin [6] and Rao [9] for some applications of
$H_n(x)$ and $J_n(x,y)$ in biological studies.) It is interesting to
note that $J_n(x,y)$, considered as a function of $(x,y)$, is convex.
This meets the intuitive requirement that the average of the
divergences within the pairs $(x,y)$ and $(z,w)$ is not less than the
divergence within their convex combination $\lambda(x,y) + \mu(z,w)$,
where $\lambda, \mu \ge 0$ and $\lambda + \mu = 1$. The convexity of the
divergence measure $J_n(x,y)$ is an additional attractive feature of
the Shannon entropy $H_n(x)$ as a measure of diversity of a
distribution.
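The decomposition (1.1) can be checked numerically. The following Python sketch is an illustration only: the function names and the two example distributions are our own choices, not part of the paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H_n(x) = -sum x_i log x_i (with 0 log 0 = 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def jensen_difference(p, q):
    """Jensen difference (1.2): 2 H_n((x+y)/2) - H_n(x) - H_n(y)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return 2 * shannon_entropy(mid) - shannon_entropy(p) - shannon_entropy(q)

# Two illustrative distributions (arbitrary values chosen for the example).
x = [0.7, 0.2, 0.1]
y = [0.1, 0.3, 0.6]
mid = [(a + b) / 2 for a, b in zip(x, y)]

within = 0.5 * (shannon_entropy(x) + shannon_entropy(y))  # first term of (1.1)
between = 0.5 * jensen_difference(x, y)                   # second term of (1.1)
total = shannon_entropy(mid)                              # left side of (1.1)
print(total, within + between)
```

The two printed numbers agree up to rounding, and `jensen_difference(x, x)` evaluates to zero, in line with the statement that the Jensen difference vanishes when the two distributions coincide.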
In this paper we consider the Jensen difference (1.2) arising from a
generalized class of entropy functions, including the α-order entropies
due to Havrda and Charvat [3], which we call the J-divergence, and
examine its convexity.

In particular, we show that the J-divergence (1.2) based on the
α-order entropy

$$H_{n,\alpha}(x) = (\alpha-1)^{-1}\Bigl(1 - \sum_{i=1}^{n} x_i^{\alpha}\Bigr), \quad \alpha \ne 1, \qquad (1.3)$$

defined on the convex set

$$S_n = \Bigl\{(x_1,\dots,x_n) \in I^n : \sum_{i=1}^{n} x_i = 1\Bigr\}, \quad I = (0,1), \qquad (1.4)$$

is convex on $S_n \times S_n$ if and only if $\alpha \in [1,2]$ for
$n > 2$, and if and only if $\alpha \in [1,2]$ or $\alpha \in [3,11/3]$
for $n = 2$. The last result is surprising and the proof is rather
involved.

We define two other measures, called the K- and L-divergences
(equations (2.4) and (2.5)), based on cross entropy functions
(Good [2]) and study their convexity. These are similar to, and
include, the divergence measure introduced by Jeffreys [4] for
providing an invariant density of a priori probability and applied for
the more general purpose of statistical inference by Kullback and
Leibler [5].

As a by-product of these results we obtain some interesting
inequalities (equations (4.3) and (5.7)).

We note that the J-, K- and L-divergences are semi-metrics and not, in
general, metrics, as they may not satisfy the triangle inequality.
However, by considering these functions on a tangent space of a
parametric space of probability distributions, one is led to a
differential metric of a Riemannian geometry which induces a metric
over the space of distribution functions. This was done earlier by
Rao [7,8], where the differential metric is given in terms of the
information matrix of a parametric family of probability
distributions. This metric has recently been studied by Atkinson and
Mitchell [1]. Some extensions of this approach to more general convex
functions, along with other local properties of the J-, K- and
L-divergences, will be presented elsewhere. The present study is an
investigation of the global properties of these divergence measures.


2.  PRELIMINARIES  AND  NOTATION 

Let $\phi$ be a $C^2$-function on a domain $D$ of $\mathbb{R}^n$. The
Hessian of $\phi$ at $x \in D$ along the direction $u \in \mathbb{R}^n$
is defined by

$$\Delta_u \phi(x) = d^2\phi(x:u) = u^T M_\phi u,$$

where $M_\phi$ is the $n \times n$ matrix whose entries are
$\partial_{x_i}\partial_{x_j}\phi(x)$; $i,j = 1,\dots,n$. This may also
be written as

$$\Delta_u \phi(x) = u^T[\partial_{x_i}\partial_{x_j}\phi]\,u.$$

Sometimes it is convenient to consider a function $\psi$ as a function
on the cartesian product $\mathbb{R}^n \times \mathbb{R}^n$. In this
case we assume that $\psi$ is a $C^2$-function on $D \times D$. The
Hessian, then, of $\psi$ at $(x,y) \in D \times D$ along the direction
$(u,v) \in \mathbb{R}^n \times \mathbb{R}^n$ is given by

$$\Delta_{(u,v)}\psi(x,y) = u^T[\partial_{x_i}\partial_{x_j}\psi]\,u + 2v^T[\partial_{y_i}\partial_{x_j}\psi]\,u + v^T[\partial_{y_i}\partial_{y_j}\psi]\,v \qquad (2.1)$$

with the obvious meaning of the expressions involved.

Let $D$ be a convex domain of $\mathbb{R}^n$. A function $\phi$ of
class $C^2(D)$ is said to be convex on $D$ if for every
$(x,u) \in D \times \mathbb{R}^n$, $\Delta_u\phi(x) \ge 0$. The
smoothness assumption $\phi \in C^2(D)$ can, of course, be weakened by
only requiring that $\phi$ be continuous on $D$ with
$\Delta_u\phi(x) \ge 0$, where the partial derivatives are taken in
the distributional sense. Alternatively, one may apply a standard
regularization process. We briefly recall this concept. We choose a
nonnegative $C^\infty$-function $K$ whose compact support is inside the
unit ball of $\mathbb{R}^n$ and such that

$$\int K(x)\,dx = 1.$$

For $\varepsilon > 0$ we define

$$K_\varepsilon(x) = \varepsilon^{-n}K(\varepsilon^{-1}x).$$

Suppose $f$ is locally integrable in the domain $D$ of $\mathbb{R}^n$.
We may assume that $f = 0$ outside a compact set and thus
$f \in L^1(\mathbb{R}^n)$. We define

$$f_\varepsilon(y) = (f * K_\varepsilon)(y) = \int f(x)K_\varepsilon(y-x)\,dx = \int K(x)f(y-\varepsilon x)\,dx.$$

As is well known, $f_\varepsilon \in C^\infty(D)$. Moreover, if in
addition $f$ is continuous on $D$, then it is uniformly continuous on
compacts of $D$ and $\lim_{\varepsilon \to 0} f_\varepsilon = f$
uniformly on compacts of $D$.

For a function $\phi$ which is continuous on a convex domain $D$, but
not necessarily of class $C^2(D)$, to be convex (in the generalized
sense) in $D$, we need only require that its regularization, defined
above, be convex in $D$ in the previously described restrictive sense.
It is said to be concave if $-\phi$ is convex. Thanks to the above
process of regularization we may always assume that the functions in
question are sufficiently smooth.

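Let $\phi$ be a $C^2$-function on an interval $I$ of $\mathbb{R}$ and
consider the $\phi$-entropy

$$H_{n,\phi}(x) = -\sum_{i=1}^{n} \phi(x_i), \quad x \in I^n, \qquad (2.2)$$

as a function defined on $I^n$. The Jensen difference (1.2) based on
(2.2), which will be referred to as the J-divergence between $x$ and
$y$, is

$$J_{n,\phi}(x,y) = \sum_{i=1}^{n} \{\phi(x_i) + \phi(y_i) - 2\phi[(x_i+y_i)/2]\}, \quad (x,y) \in I^n \times I^n. \qquad (2.3)$$

When the interval $I$ does not contain the origin, we consider
alternative measures which may be called the K- and L-divergences,

$$K_{n,\phi}(x,y) = \sum_{i=1}^{n} (x_i - y_i)\left[\frac{\phi(x_i)}{x_i} - \frac{\phi(y_i)}{y_i}\right] \qquad (2.4)$$

and

$$L_{n,\phi}(x,y) = \sum_{i=1}^{n} \left[x_i\,\phi\!\left(\frac{y_i}{x_i}\right) + y_i\,\phi\!\left(\frac{x_i}{y_i}\right)\right]. \qquad (2.5)$$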
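The definitions (2.2)-(2.5) translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, `phi` stands for any user-supplied function on the relevant interval, and the example vectors are arbitrary.

```python
import math

def phi_entropy(phi, x):
    """phi-entropy (2.2): H_{n,phi}(x) = -sum phi(x_i)."""
    return -sum(phi(xi) for xi in x)

def j_divergence(phi, x, y):
    """J-divergence (2.3): Jensen difference of phi, summed over coordinates."""
    return sum(phi(a) + phi(b) - 2.0 * phi((a + b) / 2.0) for a, b in zip(x, y))

def k_divergence(phi, x, y):
    """K-divergence (2.4); all coordinates must be strictly positive."""
    return sum((a - b) * (phi(a) / a - phi(b) / b) for a, b in zip(x, y))

def l_divergence(phi, x, y):
    """L-divergence (2.5); all coordinates must be strictly positive."""
    return sum(a * phi(b / a) + b * phi(a / b) for a, b in zip(x, y))

def phi_shannon(t):
    """phi(t) = t log t, which generates the Shannon entropy."""
    return t * math.log(t)

x = [0.5, 0.3, 0.2]
y = [0.2, 0.2, 0.6]
print(phi_entropy(phi_shannon, x))
print(j_divergence(phi_shannon, x, y))
print(k_divergence(phi_shannon, x, y))
print(l_divergence(phi_shannon, x, y))
```

For `phi_shannon` the K- and L-divergences coincide, both reducing to $\sum (x_i - y_i)(\log x_i - \log y_i)$.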


The Hessians of (2.3)-(2.5) can be computed using the formula (2.1).
However, it is of some practical interest to consider the divergence
measures (2.3)-(2.5) as acting on the convex set $S_n$ defined in
(1.4). In this case, (2.2) can be written as

$$H_{n,\phi}(x:X) = H_{n-1,\phi}(x) + H_{1,\phi}(X), \qquad (2.6)$$

where

$$x = (x_1,\dots,x_{n-1}) \in I^{n-1}, \quad X = 1 - \sum_{i=1}^{n-1} x_i \in I. \qquad (2.7)$$

Then (2.3) may be written as

$$J_{n,\phi}(x:X,\,y:Y) = J_{n-1,\phi}(x,y) + J_{1,\phi}(X,Y), \qquad (2.8)$$

where $y, Y$ are defined in the same way as $x, X$. Similar expressions
for the K- and L-divergences (2.4) and (2.5) are also available. Note
that

$$\Delta_u H_{n,\phi}(x:X) = \Delta_u H_{n-1,\phi}(x) + \Delta_U H_{1,\phi}(X) \qquad (2.9)$$

and the Hessian of (2.3) subject to (2.7) is

$$\Delta_{u,v} J_{n,\phi}(x:X,\,y:Y) = \Delta_{u,v} J_{n-1,\phi}(x,y) + \Delta_{U,V} J_{1,\phi}(X,Y), \qquad (2.10)$$

with similar expressions for the K- and L-divergences, where
$u = (u_1,\dots,u_{n-1})$, $v = (v_1,\dots,v_{n-1}) \in \mathbb{R}^{n-1}$
and

$$U = \sum_{i=1}^{n-1} u_i, \quad V = \sum_{i=1}^{n-1} v_i \in \mathbb{R}.$$

We denote by

$$\bar{S}_n = \Bigl\{(x_1,\dots,x_n) \in I^n : \sum_{i=1}^{n} x_i = 1\Bigr\}; \quad I = [0,1], \ n \ge 2,$$

the closure of $S_n$ defined in (1.4). For any real number $\alpha$, we
define

$$\phi_\alpha(x) = \begin{cases} (\alpha-1)^{-1}(x^\alpha - x), & \alpha \ne 1 \\ x \log x, & \alpha = 1 \end{cases} \qquad (2.11)$$

over $x \in \mathbb{R}_+ \equiv (0,\infty)$, and when $\alpha > 0$,
$\phi_\alpha$ can be extended to $x = 0$ with the convention
$0 \log 0 = 0$. Defining

$$H_{n,\alpha} \equiv H_{n,\phi_\alpha}, \qquad (2.12)$$

we have

$$H_{n,1}(x) = -\sum x_i \log x_i, \quad x \in \mathbb{R}_+^n, \qquad (2.13)$$

$$H_{n,\alpha}(x) = (\alpha-1)^{-1}\Bigl(1 - \sum x_i^\alpha\Bigr), \quad x \in S_n, \ \alpha \ne 1. \qquad (2.14)$$

We note that $H_{n,\alpha}$, for $\alpha > 0$, can be extended to the
closure $\bar{S}_n$, which is the α-order entropy introduced by Havrda
and Charvat [3], and that $H_{n,\alpha}$ tends to $H_{n,1}$ as
$\alpha \to 1$, which is the Shannon entropy $H_n$.

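The J-, K- and L-divergences based on $H_{n,\alpha}$ are denoted by
$J_{n,\alpha}$, $K_{n,\alpha}$ and $L_{n,\alpha}$, respectively. Their
explicit expressions are as follows:

$$J_{n,\alpha}(x,y) = \begin{cases} (\alpha-1)^{-1}\sum \{x_i^\alpha + y_i^\alpha - 2[(x_i+y_i)/2]^\alpha\}, & \alpha \ne 1 \\ \sum \{x_i \log x_i + y_i \log y_i - (x_i+y_i)\log[(x_i+y_i)/2]\}, & \alpha = 1 \end{cases} \qquad (2.15)$$

$$K_{n,\alpha}(x,y) = \begin{cases} (\alpha-1)^{-1}\sum (x_i - y_i)(x_i^{\alpha-1} - y_i^{\alpha-1}), & \alpha \ne 1 \\ \sum (x_i - y_i)(\log x_i - \log y_i), & \alpha = 1 \end{cases} \qquad (2.16)$$

and

$$L_{n,\alpha}(x,y) = \begin{cases} (\alpha-1)^{-1}\bigl\{\sum x_i^\alpha y_i^{1-\alpha} + \sum x_i^{1-\alpha} y_i^\alpha - 2\bigr\}, & \alpha \ne 1 \\ \sum (x_i - y_i)(\log x_i - \log y_i), & \alpha = 1. \end{cases} \qquad (2.17)$$

Here $(x,y) \in S_n \times S_n$, and for $\alpha > 0$, $J_{n,\alpha}$
can be extended to $\bar{S}_n \times \bar{S}_n$. We note that
$K_{n,1} = L_{n,1}$ and these expressions are the same as the
divergence measure of Jeffreys [4] and Kullback and Leibler [5].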
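The explicit forms (2.15)-(2.17) can be coded directly, which also makes it easy to see numerically that the $\alpha = 1$ cases are the limits of the $\alpha \ne 1$ cases. This is a minimal sketch with hypothetical function names; the inputs are assumed to be strictly positive and to sum to one, as required for the constant $-2$ in (2.17).

```python
import math

def j_alpha(alpha, x, y):
    """J_{n,alpha} of (2.15)."""
    if alpha == 1:
        return sum(a * math.log(a) + b * math.log(b)
                   - (a + b) * math.log((a + b) / 2) for a, b in zip(x, y))
    return sum(a ** alpha + b ** alpha - 2 * ((a + b) / 2) ** alpha
               for a, b in zip(x, y)) / (alpha - 1)

def k_alpha(alpha, x, y):
    """K_{n,alpha} of (2.16)."""
    if alpha == 1:
        return sum((a - b) * (math.log(a) - math.log(b)) for a, b in zip(x, y))
    return sum((a - b) * (a ** (alpha - 1) - b ** (alpha - 1))
               for a, b in zip(x, y)) / (alpha - 1)

def l_alpha(alpha, x, y):
    """L_{n,alpha} of (2.17); assumes sum(x) == sum(y) == 1."""
    if alpha == 1:
        return sum((a - b) * (math.log(a) - math.log(b)) for a, b in zip(x, y))
    s = sum(a ** alpha * b ** (1 - alpha) + a ** (1 - alpha) * b ** alpha
            for a, b in zip(x, y))
    return (s - 2) / (alpha - 1)

x = [0.5, 0.3, 0.2]
y = [0.2, 0.2, 0.6]
for alpha in (0.5, 1, 1 + 1e-6, 1.5, 2):
    print(alpha, j_alpha(alpha, x, y), k_alpha(alpha, x, y), l_alpha(alpha, x, y))
```

The rows for $\alpha = 1$ and $\alpha = 1 + 10^{-6}$ should be nearly identical, and the K and L columns agree exactly at $\alpha = 1$.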


3.  THE  J-DIVERGENCE 


The Hessian of $J_{n,\phi}$, in view of (2.1), is given by

$$\Delta_{(u,v)} J_{n,\phi}(x,y) = \sum_{i=1}^{n} \{a(x_i,y_i)u_i^2 + 2b(x_i,y_i)u_i v_i + a(y_i,x_i)v_i^2\}, \qquad (3.1)$$

where $x, y \in I^n$ with $I$ being any interval of the line. Here, for
$x, y \in I$,

$$b(x,y) = -\tfrac{1}{2}\,\phi''[(x+y)/2] \qquad (3.2)$$

and

$$a(x,y) = \phi''(x) + b(x,y); \quad x, y \in I. \qquad (3.3)$$

This shows that $J_{n,\phi}$ is convex (concave) on $I^n \times I^n$ if
and only if $a(x,y) \ge 0$ (or $a(x,y) \le 0$) and

$$d(x,y) \equiv a(x,y)a(y,x) - [b(x,y)]^2 \ge 0 \qquad (3.4)$$

for every $(x,y) \in I \times I$.

Now, using (3.2)-(3.4) we deduce that for $x, y \in I$,

$$a(x,y) = \phi''(x)\,\phi''[(x+y)/2]\left\{\frac{1}{\phi''[(x+y)/2]} - \frac{1}{2\phi''(x)}\right\}$$

and

$$d(x,y) = \phi''(x)\,\phi''(y)\,\phi''[(x+y)/2]\left\{\frac{1}{\phi''[(x+y)/2]} - \frac{1}{2\phi''(x)} - \frac{1}{2\phi''(y)}\right\}.$$

The expression in the last curly bracket is directly related to the
Jensen difference of $(\phi'')^{-1}$. This, together with a closer
examination of these expressions, leads to the following basic result:

Theorem 1. $J_{n,\phi}$ is convex (concave) on $I^n \times I^n$ if and
only if $\phi$ is convex (concave) and $(\phi'')^{-1}$ is concave
(convex) on $I$.
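The criterion (3.2)-(3.4) is easy to probe numerically for the family $\phi_\alpha$ of (2.11), for which $\phi_\alpha''(t) = \alpha t^{\alpha-2}$. The sketch below is only a grid check under that assumption, not a proof; the function names and the grid are illustrative choices.

```python
def d2_phi_alpha(alpha, t):
    """Second derivative of phi_alpha from (2.11): alpha * t**(alpha - 2)."""
    return alpha * t ** (alpha - 2)

def hessian_coeffs(alpha, x, y):
    """Return (a(x,y), a(y,x), b(x,y), d(x,y)) as defined in (3.2)-(3.4)."""
    b = -0.5 * d2_phi_alpha(alpha, (x + y) / 2)
    a_xy = d2_phi_alpha(alpha, x) + b
    a_yx = d2_phi_alpha(alpha, y) + b
    d = a_xy * a_yx - b * b
    return a_xy, a_yx, b, d

# Grid of positive points 0.1, 0.2, ..., 5.0 (an arbitrary choice).
grid = [0.1 * k for k in range(1, 51)]

def min_d(alpha):
    """Smallest discriminant d(x,y) found on the grid."""
    return min(hessian_coeffs(alpha, x, y)[3] for x in grid for y in grid)

for alpha in (1.0, 1.5, 2.0, 3.0):
    print(alpha, min_d(alpha))
```

For $\alpha \in [1,2]$ the minimum stays at zero up to rounding, while for $\alpha = 3$ it is clearly negative, which agrees with Theorem 1 since $(\phi_3'')^{-1}(t) = 1/(3t)$ is convex rather than concave on the positive half-line.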

As an application of the theorem we consider the following family of
functions

$$g_\alpha(x) = a f_\alpha(x) + bx + c, \qquad (3.5)$$

where $a, b, c$ are arbitrary constants and $\{f_\alpha\}$ is a
one-parameter family of nonnegative functions defined on an interval
$I$ such that

$$f_\alpha''(x) = \alpha(\alpha-1) f_{\alpha-2}(x); \quad x \in I, \ \alpha \in \mathbb{R}. \qquad (3.6)$$

We shall fix a normalization

$$a\alpha(\alpha-1) \ge 0, \qquad (3.7)$$

from which it follows that $g_\alpha$ is convex on $I$ for any
$\alpha \in \mathbb{R}$. An immediate consequence of Theorem 1 is the
following:

Corollary 1. Let the notation of (3.5)-(3.7) apply and consider
$H_{n,g_\alpha}$ and $J_{n,g_\alpha}$ as formed in (2.2)-(2.3). Then,
for any $\alpha \in \mathbb{R}$, $H_{n,g_\alpha}$ is concave on $I^n$
while $J_{n,g_\alpha}$ is never concave on $I^n \times I^n$. Moreover,
$J_{n,g_\alpha}$ is convex on $I^n \times I^n$ if and only if
$(f_{\alpha-2})^{-1}$ is concave on $I$.

This corollary is applied to the following special case:

$$f_\alpha(x) = x^\alpha; \quad x \in \mathbb{R}_+.$$

Writing $\beta = \alpha - 2$, we examine whether
$h_\beta \equiv (f_\beta)^{-1}$ is concave on $\mathbb{R}_+$. We have

$$h_\beta''(x) = \beta(\beta+1)x^{-\beta-2}, \quad x \in \mathbb{R}_+,$$

and thus $h_\beta$ is concave if and only if $\beta \in [-1,0]$. This
yields the following result:


Corollary 2. Let

$$g_\alpha(x) = a x^\alpha + bx + c, \quad x \in \mathbb{R}_+,$$

where $a, b, c$ and $\alpha$ are constants with
$a\alpha(\alpha-1) \ge 0$. Then $H_{n,g_\alpha}$ is concave on
$\mathbb{R}_+^n$ while $J_{n,g_\alpha}$ is never concave on
$\mathbb{R}_+^n \times \mathbb{R}_+^n$. Moreover, $J_{n,g_\alpha}$ is
convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$ if and only if
$\alpha \in [1,2]$, in which case $a \ge 0$.

Instead of $g_\alpha$ in this corollary we may take $\phi_\alpha$ as in
(2.11), and consequently:

Corollary 3. For any $\alpha > 0$, $H_{n,\phi_\alpha}$ is concave on
$\mathbb{R}_+^n$ and $J_{n,\phi_\alpha}$ is never concave on
$\mathbb{R}_+^n \times \mathbb{R}_+^n$. Moreover, $J_{n,\phi_\alpha}$
is convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$ if and only if
$\alpha \in [1,2]$.

Using this corollary and (2.6)-(2.10) we see that $J_{n,\alpha}$, for
$n \ge 3$, is convex on $\bar{S}_n \times \bar{S}_n$ if and only if
$\alpha \in [1,2]$. Of course, $J_{2,\alpha}$ is also convex on
$\bar{S}_2 \times \bar{S}_2$ for every $\alpha \in [1,2]$. However,
$J_{2,\alpha}$, interestingly, is also convex for other values of
$\alpha$, viz., in $[3,11/3]$. The proof of this fact is postponed to
the next section. Meanwhile, we shall record the following corollary:

Corollary 4. For any $\alpha > 0$, $H_{n,\alpha}$ of (2.12) is concave
on $\bar{S}_n$ and $J_{n,\alpha}$ of (2.15) is never concave on
$\bar{S}_n \times \bar{S}_n$. Moreover, for $n \ge 3$, $J_{n,\alpha}$
is convex on $\bar{S}_n \times \bar{S}_n$ if and only if
$\alpha \in [1,2]$. Also, if $\alpha \in [1,2]$ then $J_{2,\alpha}$ is
convex on $\bar{S}_2 \times \bar{S}_2$.


4.  ADDITIONAL  PROPERTIES  OF  THE  J-DIVERGENCE 

In order to deal with $J_{2,\alpha}$ on $\bar{S}_2 \times \bar{S}_2$ we
shall apply Corollary 1 to the following family

$$f_\alpha(x) = x^\alpha + (1-x)^\alpha; \quad x \in I \equiv [0,1].$$

For this purpose we shall establish the following lemma, which is of
some interest in its own right.

Lemma 1. The function

$$h_\beta(x) \equiv [f_\beta(x)]^{-1} = \{x^\beta + (1-x)^\beta\}^{-1}; \quad x \in I = [0,1],$$

has the following properties:

(i) for $\beta \in (-\infty,-1)$ and $\beta \in [2,\infty)$, $h_\beta$ has inflection points on $I$;

(ii) for $\beta \in (0,1)$, $h_\beta$ is (strictly) convex on $I$;

(iii) for $\beta \in [-1,0]$, $h_\beta$ is concave on $I$;

(iv) for $\beta \in [1,5/3]$, $h_\beta$ is concave on $I$, while for $\beta \in (5/3,2)$, $h_\beta$ has inflection points on $I$.

Proof. We have

$$h_\beta'' = f_\beta^{-3}\,[2(f_\beta')^2 - \beta(\beta-1) f_\beta f_{\beta-2}]$$

and item (ii) follows at once. To proceed with the other items, we
study the sign of the function

$$2\beta^2[x^{\beta-1} - (1-x)^{\beta-1}]^2 - \beta(\beta-1)[x^{\beta-2} + (1-x)^{\beta-2}][x^\beta + (1-x)^\beta].$$

This function is symmetric about the point $x = 1/2$ and it is
therefore more convenient to introduce the new variable
$y = (1-x)/x$ with $y \in [0,1]$. This corresponds to $x \in [1/2,1]$
and, by symmetry, $y$ may also be allowed to range in $[1,\infty]$.
With this new variable, the sign of the above function is the same as
that of

$$F_\beta(y) = \beta\{2\beta(1 - y^{\beta-1})^2 - (\beta-1)(1 + y^\beta)(1 + y^{\beta-2})\}.$$

This may also be written as

$$F_\beta(y) = \beta\{(\beta+1)(1 - y^{\beta-1})^2 - (\beta-1)y^{\beta-2}(1+y)^2\}. \qquad (4.1)$$

When $\beta \in [-1,0]$ it follows from (4.1) that $F_\beta(y) \le 0$
and therefore item (iii) follows. As for item (i), we see from (4.1)
that

$$F_\beta(0) = +\infty, \quad F_\beta(1) = 4\beta(1-\beta) < 0 \quad \text{for } \beta \in (-\infty,-1),$$

$$F_2(0) = 4, \quad F_2(1) = -8,$$

and

$$F_\beta(0) = \beta(\beta+1) > 0, \quad F_\beta(1) = -4\beta(\beta-1) < 0 \quad \text{for } \beta \in (2,\infty).$$

Consequently, item (i) follows. We turn now to item (iv). Here
$F_1(y) \equiv 0$ and we shall therefore assume that
$\beta \in (1,2)$. A differentiation of (4.1) gives

$$F_\beta'(y) = \beta(\beta-1)y^{\beta-3}\{2(\beta+1)y^\beta - \beta y^2 - 4\beta y + 2 - \beta\}.$$

The sign of this derivative is determined by

$$G_\beta(y) = 2(\beta+1)y^\beta - \beta y^2 - 4\beta y + 2 - \beta.$$

Now,

$$G_\beta(0) = 2 - \beta > 0, \quad G_\beta(1) = -4(\beta-1) < 0,$$

and hence $G_\beta(y_\beta) = 0$ for some $y_\beta \in (0,1)$. Next, we
have

$$G_\beta'(y) = 2\beta\{(\beta+1)y^{\beta-1} - y - 2\}.$$

However, by Bernoulli's inequality,

$$y + 2 - (\beta+1)y^{\beta-1} = y + 2 - (\beta+1)[1 - (1-y)]^{\beta-1} \ge y + 2 - (\beta+1)[1 - (\beta-1)(1-y)] = (2-\beta^2)y + \beta(\beta-1).$$

The last expression describes a straight line passing through the
points $(0, \beta^2-\beta)$ and $(1, 2-\beta)$ and therefore

$$y + 2 - (\beta+1)y^{\beta-1} > 0 \quad \text{for } y \in (0,1).$$

Consequently, $y_\beta$ is the only root of $G_\beta(y) = 0$ in
$(0,1)$ and, moreover, $F_\beta(y)$ has a single maximum at
$y_\beta \in (0,1)$. The root $y_\beta$ lies on the variety

$$2(\beta+1)y^\beta = \beta y^2 + 4\beta y + \beta - 2. \qquad (4.2)$$

We replace $y^\beta$ in (4.1) by the quadratic expression in (4.2).
This, after some manipulations, results in

$$H_\beta(y) \equiv -4\,\frac{\beta+1}{\beta^2}\,y^2 F_\beta(y) = (\beta-2)y^4 + 8(\beta-1)y^3 + 2(7\beta-6)y^2 + 8(\beta-1)y + \beta - 2$$

and, hence, we seek $\beta$ for which $H_\beta(y_\beta) \ge 0$.
However, we can factor $H_\beta(y)$ in the form

$$H_\beta(y) = (\beta-2)(1+y)^2[y - B(\beta)][y - B(\beta)^{-1}],$$

where

$$B(\beta) \equiv (2-\beta)^{-1}\{3\beta - 2 - 2[2\beta(\beta-1)]^{1/2}\}.$$

Since $\beta \in (1,2)$, we clearly have
$0 < B(\beta) < 1 < B(\beta)^{-1}$. Hence, $H_\beta(y_\beta) \ge 0$ if
and only if

$$y_\beta \ge B(\beta).$$

Since $G_\beta$ is decreasing on $(0,1)$, this condition is equivalent
to the requirement that $G_\beta[B(\beta)] \ge 0$. This requirement is
determined by the region of non-negativity of the function

$$K(\beta) = 2(\beta+1)B(\beta)^\beta - \beta B(\beta)^2 - 4\beta B(\beta) + 2 - \beta; \quad \beta \in (1,2).$$

We have

$$K(1) = K(2) = 0; \quad K'(1) = +\infty, \quad K'(2) = 0.$$

Moreover, a direct calculation shows that $K(5/3) = 0$ and that
$\beta = 5/3$ is the cut-off point of the region of non-negativity.
Thus $K(\beta) > 0$ for all $\beta \in (1,5/3)$, $K(5/3) = 0$ and
$K(\beta) < 0$ for all $\beta \in (5/3,2)$ (see Figure 1). The proof of
the lemma is now complete.
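The sign change of $K(\beta)$ at $\beta = 5/3$ can be observed numerically. The sketch below simply evaluates $B(\beta)$, $G_\beta$ and $K(\beta) = G_\beta[B(\beta)]$ as defined in the proof; the Python names `B`, `G`, `K` mirror the symbols of the text, and the sample values of $\beta$ are arbitrary.

```python
import math

def B(beta):
    """Smaller root B(beta) from the factorization of H_beta, 1 < beta < 2."""
    return (3 * beta - 2 - 2 * math.sqrt(2 * beta * (beta - 1))) / (2 - beta)

def G(beta, y):
    """G_beta(y) = 2(beta+1) y^beta - beta y^2 - 4 beta y + 2 - beta."""
    return 2 * (beta + 1) * y ** beta - beta * y * y - 4 * beta * y + 2 - beta

def K(beta):
    """K(beta) = G_beta(B(beta)); its sign decides item (iv) of Lemma 1."""
    return G(beta, B(beta))

for beta in (1.2, 1.5, 5 / 3, 1.8):
    print(beta, B(beta), K(beta))
```

At $\beta = 5/3$ one finds $B = 9 - 4\sqrt{5}$, and the printed value of `K(5 / 3)` is zero up to rounding error, with positive values below and negative values above this threshold.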

Before proceeding any further we shall record an interesting
consequence of this lemma, or rather of the proof of the lemma.

Corollary 5. For any $\gamma \in [0,2/3]$ the following inequality
holds for all $t \in (-\infty,\infty)$:

$$\left(\frac{\sinh \gamma t}{\cosh t}\right)^2 \le \frac{\gamma}{\gamma+2}. \qquad (4.3)$$

[Figure 1: the function $K(\beta)$ on $(1,2)$.]

Proof. From (4.1) we know that $F_\beta(y) \le 0$ for
$\beta \in [1,5/3]$ and, therefore,

$$(\beta+1)(1 - y^{\beta-1})^2 \le (\beta-1)y^{\beta-2}(1+y)^2.$$

This was proved for $y \in [0,1]$. However, this inequality is
invariant under the substitution $y \to y^{-1}$ and, therefore, it is
valid for all $y \in (0,\infty)$. Setting $y^{1/2} = e^t$ and
$\beta = \gamma + 1$ concludes the proof.
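Inequality (4.3) lends itself to a quick grid check. The following sketch is an illustration, not a proof: `gap` is a hypothetical helper returning the right side minus the left side of (4.3), and the grids for $\gamma$ and $t$ are arbitrary.

```python
import math

def gap(gamma, t):
    """Right side minus left side of (4.3); non-negative when (4.3) holds."""
    return gamma / (gamma + 2) - (math.sinh(gamma * t) / math.cosh(t)) ** 2

ts = [k / 100 for k in range(-2000, 2001)]   # t in [-20, 20]
gammas = [k / 30 for k in range(0, 21)]      # gamma = 0, 1/30, ..., 2/3

# Smallest gap over the grid; should not be negative beyond rounding.
worst = min(gap(g, t) for g in gammas for t in ts)
print(worst)

# The bound is attained for gamma = 2/3 at t = 3 log((1 + sqrt 5)/2).
t_star = 3 * math.log((1 + math.sqrt(5)) / 2)
print(gap(2 / 3, t_star))

# Outside the admissible range the inequality can fail, e.g. gamma = 0.8.
print(gap(0.8, 1.76))
```

The equality point corresponds to $y = B(5/3)$ in the proof of Lemma 1, which shows that the upper limit $2/3$ for $\gamma$ cannot be enlarged.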

Corresponding to Theorem 1 and Corollaries 1 and 2, Lemma 1 leads to:

Theorem 2. Let

$$g_\alpha(x) = a f_\alpha(x) + bx + c, \quad x \in I = [0,1],$$

where $a, b, c$ and $\alpha$ are constants with
$a\alpha(\alpha-1) \ge 0$ and

$$f_\alpha(x) = x^\alpha + (1-x)^\alpha.$$

Then $H_{n,g_\alpha}$ is concave on $I^n$. Moreover, $J_{n,g_\alpha}$
is never concave on $I^n \times I^n$. It is convex there if and only
if $\alpha \in [1,2]$ or $\alpha \in [3,11/3]$, in which case
$a \ge 0$.

Theorem 2 enables us to strengthen the result of Corollary 4 on the
Jensen difference of the α-order entropy with the following additional
feature:

Corollary 6. $J_{2,\alpha}$ is convex on $\bar{S}_2 \times \bar{S}_2$
if and only if $\alpha \in [1,2]$ or $\alpha \in [3,11/3]$.

In correspondence with (2.11) we define

$$g_\alpha(x) = \begin{cases} (\alpha-1)^{-1}[x^\alpha + (1-x)^\alpha - 1], & \alpha \ne 1 \\ x \log x + (1-x)\log(1-x), & \alpha = 1 \end{cases} \qquad (4.4)$$

for $x \in I = [0,1]$. We also define

$$G_{n,\alpha}(x) \equiv H_{n,g_\alpha}(x), \quad x \in S_n,$$

and call $G_{n,\alpha}(x)$, $x \in S_n$, the paired entropy of order
$\alpha$. Using (2.11)-(2.14), we clearly have the following
relationships:

$$G_{n,\alpha}(x) = H_{n,\alpha}(x) + H_{n,\alpha}(1-x); \quad \alpha \ne 1, \ x \in S_n,$$
$$G_{n,1}(x) = H_{n,1}(x) + H_{n,1}(1-x); \quad x \in S_n. \qquad (4.5)$$

We shall write

$$I_{n,\alpha}(x,y) \equiv J_{n,g_\alpha}(x,y); \quad (x,y) \in S_n \times S_n, \qquad (4.6)$$

for the Jensen difference of $g_\alpha$ of (4.4). From Theorems 1, 2
and (2.6)-(2.10) we conclude:


Theorem 3. Let the notation of (4.4)-(4.6) apply with $\alpha > 0$.
Then:

(i) $G_{n,\alpha}$ is concave on $S_n$;

(ii) $I_{n,\alpha}$ is never concave on $S_n \times S_n$;

(iii) $I_{n,\alpha}$ is convex on $S_n \times S_n$ if and only if $\alpha \in [1,2]$ or $\alpha \in [3,11/3]$.

In particular,

(iv) $G_{n,1}$ is concave on $\bar{S}_n$ and $I_{n,1}$ is convex on $\bar{S}_n \times \bar{S}_n$.

Item (iv) of this theorem is a limiting case of the previous items as
$\alpha \to 1$. It could also be directly deduced from Theorem 1.
Indeed, from (4.4), $g_1''(x) = [x(1-x)]^{-1} > 0$, which shows that
$g_1$ is convex on $(0,1)$. Furthermore, $F \equiv (g_1'')^{-1}$ is
given by $F(x) = x - x^2$ and thus $F''(x) = -2 < 0$. Therefore,
$(g_1'')^{-1}$ is concave on $[0,1]$ and Theorem 1 applies.

It may be noted that we could base our analysis of sections 3 and 4 on
a more generalized form of the Jensen difference

$$J_\phi^{(\alpha,\beta)}(x,y) = 2[\alpha\phi(x) + \beta\phi(y) - \phi(\alpha x + \beta y)] \qquad (4.7)$$

with $\alpha, \beta \ge 0$, $\alpha + \beta = 1$, so that (4.7) reduces
to $J_\phi$ when $\alpha = \beta$. However, this does not constitute a
major generalization, and the results obtained for $J_\phi$ can also
be derived for $J_\phi^{(\alpha,\beta)}$ after a minor modification of
the argument.


5.  THE  K-DIVERGENCE

We briefly discuss the K-divergence $K_{n,\phi}$ defined in (2.4) and
its relationship with the J-divergence $J_{n,\phi}$. To do this we
define

$$\psi(x) = \phi(x)/x; \quad x \in \mathbb{R}_+. \qquad (5.1)$$

We start with the following simple proposition:

Proposition 1. $K_{n,\phi}$ is non-negative on
$\mathbb{R}_+^n \times \mathbb{R}_+^n$ if and only if $\psi$ is
increasing on $\mathbb{R}_+$.

Proof. This is equivalent to the specialized statement with $n = 1$,
which in turn is straightforward.

The following theorem establishes a comparison between $K_{n,\phi}$
and $J_{n,\phi}$:

Theorem 4. Assume that $\psi$ is increasing and concave on
$\mathbb{R}_+$. Then, for any
$(x,y) \in \mathbb{R}_+^n \times \mathbb{R}_+^n$,

$$J_{n,\phi}(x,y) \le K_{n,\phi}(x,y)$$

with equality if and only if $x = y$.

Proof. Again, this statement is equivalent to the specialized case of
$n = 1$. Accordingly, we consider the function

$$F(x,y) = J_{1,\phi}(x,y) - K_{1,\phi}(x,y); \quad (x,y) \in \mathbb{R}_+ \times \mathbb{R}_+.$$

Since $\phi(x) = x\psi(x)$, this may be written as

$$\frac{F(x,y)}{x+y} = \frac{y}{x+y}\,\psi(x) + \frac{x}{x+y}\,\psi(y) - \psi[(x+y)/2] \le \psi\!\left(\frac{2xy}{x+y}\right) - \psi[(x+y)/2] \le 0.$$

The first inequality follows from the concavity of $\psi$, while the
second inequality is due to the fact that $\psi$ is increasing on
$\mathbb{R}_+$ and $2xy/(x+y) \le (x+y)/2$. The equality statement
also follows and the proof is complete.
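The two-step comparison used in this proof (weighted average of $\psi$, then $\psi$ at the harmonic mean, then $\psi$ at the arithmetic mean) can be illustrated with $\phi(t) = t \log t$, for which $\psi(t) = \log t$ is increasing and concave. This is a sketch for one choice of $\phi$ only; the helper names and sample points are our own.

```python
import math

def phi(t):
    """phi(t) = t log t."""
    return t * math.log(t)

def psi(t):
    """psi(t) = phi(t)/t = log t: increasing and concave on (0, inf)."""
    return math.log(t)

def F(x, y):
    """F = J_{1,phi} - K_{1,phi} for n = 1, using (2.3) and (2.4)."""
    j = phi(x) + phi(y) - 2 * phi((x + y) / 2)
    k = (x - y) * (psi(x) - psi(y))
    return j - k

def chain(x, y):
    """Return the three quantities compared in the proof of Theorem 4."""
    weighted = (y * psi(x) + x * psi(y)) / (x + y)
    at_harmonic = psi(2 * x * y / (x + y))
    at_arithmetic = psi((x + y) / 2)
    return weighted, at_harmonic, at_arithmetic

points = [(0.2, 0.9), (1.0, 3.0), (5.0, 0.5), (2.0, 2.0)]
for x, y in points:
    w, h, a = chain(x, y)
    print(x, y, w <= h + 1e-12, h <= a + 1e-12, F(x, y))
```

Each row prints `True True` followed by a non-positive value of $F$, and $F$ vanishes when the two arguments coincide.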


The Hessian of $K_{n,\phi}$, in accordance with (2.1), is given by

$$\Delta_{(u,v)} K_{n,\phi}(x,y) = \sum_{i=1}^{n} \{a(x_i,y_i)u_i^2 + 2b(x_i,y_i)u_i v_i + a(y_i,x_i)v_i^2\}, \qquad (5.2)$$

where $x, y \in \mathbb{R}_+^n$ and, for $x, y \in \mathbb{R}_+$,

$$a(x,y) = \phi''(x) - y\,\psi''(x) \qquad (5.3)$$

and

$$b(x,y) = -[\psi'(x) + \psi'(y)] \qquad (5.4)$$

with $\psi$ as given in (5.1). It follows, therefore, that
$K_{n,\phi}$ is convex if and only if $a(x,y) \ge 0$ and

$$d(x,y) \equiv a(x,y)a(y,x) - [b(x,y)]^2 \ge 0; \quad x, y \in \mathbb{R}_+. \qquad (5.5)$$

From (5.3) we see that $a(x,y) \ge 0$ whenever $\phi$ is convex and
$\psi$ is concave on $\mathbb{R}_+$. We have:

Theorem 5. Assume that $\phi$ is convex and $\psi$ is concave on
$\mathbb{R}_+$. Then:

(i) $\psi$ is increasing on $\mathbb{R}_+$;

(ii) $K_{n,\phi}(x,y) \ge J_{n,\phi}(x,y) \ge 0$ for every $(x,y) \in \mathbb{R}_+^n \times \mathbb{R}_+^n$. Equality in one of the inequalities entails equalities in both inequalities. This occurs if and only if $x = y$.

If, in addition, (5.5) holds, then:

(iii) $K_{n,\phi}$ is convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$;

(iv) $K_{n,\phi}$ is convex on $S_n \times S_n$.

Proof. Using (5.1) we have

$$\psi'(x) = -\frac{1}{x}\,[\psi(x) - \phi'(x)]$$

and thus

$$\psi''(x) = -\frac{1}{x}\,[\psi'(x) - \phi''(x)] + \frac{1}{x^2}\,[\psi(x) - \phi'(x)] = -\frac{1}{x}\,[2\psi'(x) - \phi''(x)].$$

Therefore

$$2\psi'(x) = -x\,\psi''(x) + \phi''(x) \ge 0$$

and (i) follows. The fact that $J_{n,\phi}(x,y) \ge 0$ and its
equality statement are a result of $\phi$ being convex. Also,
$K_{n,\phi}(x,y) \ge J_{n,\phi}(x,y)$ and its equality statement
follow from item (i) because of Proposition 1 and Theorem 4. This
proves item (ii). Item (iii) follows from item (i) and the preceding
discussion. Item (iv) follows from (iii), (5.2) and formulae similar
to (2.6)-(2.10). This concludes the proof.

The following hold:

Theorem 6. Let $\alpha \in [1,2]$. Then:

(i) $K_{n,\phi_\alpha}(x,y) \ge J_{n,\phi_\alpha}(x,y) \ge 0$ for every $(x,y) \in \mathbb{R}_+^n \times \mathbb{R}_+^n$. Equality in one of the inequalities occurs if and only if $x = y$. The same applies to $K_{n,\alpha}(x,y) \ge J_{n,\alpha}(x,y) \ge 0$ for every $(x,y) \in S_n \times S_n$.

(ii) $K_{n,\phi_\alpha}$ is convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$ and $K_{n,\alpha}$ is convex on $S_n \times S_n$.

Proof. In this case $\phi_\alpha$ is convex and
$\psi_\alpha(x) = \phi_\alpha(x)/x$ is concave on $\mathbb{R}_+$ and,
therefore, we may use Theorem 5. To do so, we have to validate (5.5),
i.e., we have to show that the discriminant function

$$d_\alpha(x,y) = [\alpha x^{\alpha-2} - (\alpha-2)y x^{\alpha-3}][\alpha y^{\alpha-2} - (\alpha-2)x y^{\alpha-3}] - (x^{\alpha-2} + y^{\alpha-2})^2$$

is non-negative on $\mathbb{R}_+ \times \mathbb{R}_+$. Here,
$d_1(x,y) = d_2(x,y) = 0$; we may, therefore, assume that
$\alpha \in (1,2)$. Since $d_\alpha(x,x) = 0$ and
$d_\alpha(x,y) = d_\alpha(y,x)$, it is sufficient to assume that
$y > x > 0$. In this way, we have

$$d_\alpha(x,y) = x^{2\alpha-4} f_\alpha(t); \quad t \equiv y/x,$$

where

$$f_\alpha(t) \equiv t^{\alpha-3}[\alpha t - (\alpha-2)][\alpha - (\alpha-2)t] - (1 + t^{\alpha-2})^2. \qquad (5.6)$$

We must show that $f_\alpha(t) \ge 0$ for $t \in (1,\infty)$. After
some simplifications, we obtain

$$f_\alpha'(t) = (2-\alpha)\,t^{\alpha-4} g_\alpha(t)$$

with

$$g_\alpha(t) = \alpha(\alpha-1)t^2 - 2(\alpha-1)^2 t - \alpha(3-\alpha) + 2t^{\alpha-1}.$$

Therefore,

$$g_\alpha'(t) = 2(\alpha-1)[\alpha(t-1) + 1 + t^{\alpha-2}] > 0; \quad t \in (1,\infty), \ \alpha \in (1,2).$$

Hence $g_\alpha$ is increasing on $(1,\infty)$ and since
$g_\alpha(1) = 0$, we conclude that $g_\alpha(t) > 0$. Therefore,
$f_\alpha'(t) > 0$, i.e., $f_\alpha$ is increasing on $(1,\infty)$.
However, $f_\alpha(1) = 0$ and thus $f_\alpha(t) > 0$ for
$t \in (1,\infty)$. This concludes the proof.
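The reduction $d_\alpha(x,y) = x^{2\alpha-4} f_\alpha(y/x)$ and the sign of $f_\alpha$ can be checked on a grid. This is a numerical illustration under the formulas of the proof, with hypothetical function names; it does not replace the monotonicity argument.

```python
import math

def d_alpha(alpha, x, y):
    """Discriminant (5.5) for phi_alpha, built from (5.3) and (5.4)."""
    a_xy = alpha * x ** (alpha - 2) - (alpha - 2) * y * x ** (alpha - 3)
    a_yx = alpha * y ** (alpha - 2) - (alpha - 2) * x * y ** (alpha - 3)
    b = -(x ** (alpha - 2) + y ** (alpha - 2))
    return a_xy * a_yx - b * b

def f_alpha(alpha, t):
    """Reduced one-variable function (5.6)."""
    return (t ** (alpha - 3) * (alpha * t - (alpha - 2)) * (alpha - (alpha - 2) * t)
            - (1 + t ** (alpha - 2)) ** 2)

# Check the scaling identity at one arbitrary point.
x, y, alpha = 0.7, 2.1, 1.5
print(d_alpha(alpha, x, y), x ** (2 * alpha - 4) * f_alpha(alpha, y / x))

# Smallest value of f_alpha on a grid of t > 1 for several alpha in (1, 2).
ts = [1 + k / 100 for k in range(1, 4901)]
smallest = min(f_alpha(a, t) for a in (1.1, 1.3, 1.5, 1.7, 1.9) for t in ts)
print(smallest)

# For alpha outside [1, 2] the sign can change, e.g. alpha = 2.5 at t = 2.
print(f_alpha(2.5, 2.0))
```

The last printed value is negative, which is consistent with the restriction $\alpha \in [1,2]$ in Theorem 6.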

From the proof of this theorem we also deduce the following
inequality:

Corollary 6. Let $\beta \in [0,1/2]$. Then, for every
$s \in (-\infty,\infty)$,

$$\cosh^2 \beta s \le [\beta^2 + (1-\beta)^2]\left[1 + \frac{2\beta(1-\beta)}{\beta^2 + (1-\beta)^2}\,\cosh s\right]. \qquad (5.7)$$

Proof. For $\alpha \in [1,2]$ we have shown that $f_\alpha$ of (5.6)
satisfies $f_\alpha(t) \ge 0$ for every $t \in [1,\infty)$. This is
equivalent to

$$[\alpha t - (\alpha-2)][\alpha t^{-1} - (\alpha-2)] \ge [t^{(2-\alpha)/2} + t^{-(2-\alpha)/2}]^2$$

for every $t \in [1,\infty)$. Since this inequality is invariant under
the transition $t \to t^{-1}$, it holds for every
$t \in (0,\infty)$. Putting $t = e^s$ and $\beta = (2-\alpha)/2$
concludes the proof.
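Inequality (5.7) can likewise be tested on a grid. The sketch below uses hypothetical helpers `lhs` and `rhs` for the two sides of (5.7); a relative tolerance is used because both sides grow like $e^{|s|}$.

```python
import math

def lhs(beta, s):
    """Left side of (5.7): cosh^2(beta * s)."""
    return math.cosh(beta * s) ** 2

def rhs(beta, s):
    """Right side of (5.7)."""
    c = beta ** 2 + (1 - beta) ** 2
    return c * (1 + 2 * beta * (1 - beta) / c * math.cosh(s))

betas = [k / 20 for k in range(0, 11)]       # 0, 0.05, ..., 0.5
ss = [k / 10 for k in range(-100, 101)]      # s in [-10, 10]

# (5.7) should hold at every grid point, up to relative rounding error.
holds = all(lhs(b, s) <= rhs(b, s) * (1 + 1e-12) for b in betas for s in ss)
print(holds)

# At the end points beta = 0 and beta = 1/2 the two sides coincide.
print(lhs(0.5, 3.0), rhs(0.5, 3.0))
```

The equality at $\beta = 1/2$ is the half-angle identity $\cosh^2(s/2) = (1 + \cosh s)/2$.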


6.  THE  L-DIVERGENCE 


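The Hessian of $L_{n,\phi}(x,y)$ defined in (2.5), in view of (2.1), is

$$\Delta_{(u,v)} L_{n,\phi}(x,y) = \sum_{i=1}^{n} \{a(x_i,y_i)u_i^2 + 2b(x_i,y_i)u_i v_i + a(y_i,x_i)v_i^2\},$$

where $(x,y) \in \mathbb{R}_+^n \times \mathbb{R}_+^n$. Here

$$a(x,y) = \frac{1}{y}\,\phi''\!\left(\frac{x}{y}\right) + \frac{y^2}{x^3}\,\phi''\!\left(\frac{y}{x}\right)$$

and

$$b(x,y) = -\frac{y}{x^2}\,\phi''\!\left(\frac{y}{x}\right) - \frac{x}{y^2}\,\phi''\!\left(\frac{x}{y}\right); \quad x, y \in \mathbb{R}_+.$$

In this case, the discriminant

$$d(x,y) = a(x,y)a(y,x) - [b(x,y)]^2$$

is identically zero on $\mathbb{R}_+ \times \mathbb{R}_+$. This,
together with formulae similar to (2.6)-(2.10), leads to: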


Theorem 7. The following hold:

(i) $L_{n,\phi}(x,y) \ge 0$ for every $n \ge 1$ and every $(x,y) \in \mathbb{R}_+^n \times \mathbb{R}_+^n$ if and only if the function $\psi(t) = t\,\phi(t^{-1}) + \phi(t)$ is non-negative for all $t \in \mathbb{R}_+$;

(ii) $L_{n,\phi}$ is convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$ if and only if $\psi(t) = t\,\phi(t^{-1}) + \phi(t)$ is convex on $\mathbb{R}_+$.

Proof. As for item (i), we have

$$L_{n,\phi}(x,y) = \sum_{i=1}^{n} x_i\,\psi(t_i); \quad t_i = y_i/x_i,$$

and $L_{1,\phi}(x,y) = x\,\psi(t)$, $t = y/x$. Thus (i) follows. As for
item (ii), since $d(x,y) = 0$ for every
$(x,y) \in \mathbb{R}_+ \times \mathbb{R}_+$, we have that
$L_{n,\phi}$ is convex on $\mathbb{R}_+^n \times \mathbb{R}_+^n$ if
and only if

$$a(x,y) = \frac{y^2}{x^3}\,\phi''\!\left(\frac{y}{x}\right) + \frac{1}{y}\,\phi''\!\left(\frac{x}{y}\right) \ge 0; \quad (x,y) \in \mathbb{R}_+ \times \mathbb{R}_+.$$

Putting $t = y/x$ this condition becomes

$$t^{-3}\phi''(t^{-1}) + \phi''(t) \ge 0; \quad t \in \mathbb{R}_+.$$

This means that $\psi''(t) \ge 0$ and the theorem follows.
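The vanishing of the discriminant and the link between $a(x,y)$ and $\psi''$ can be illustrated for $\phi = \phi_\alpha$, where $\phi_\alpha''(t) = \alpha t^{\alpha-2}$. In this sketch the coefficient $b(x,y)$ is the mixed second derivative of $x\phi(y/x) + y\phi(x/y)$ computed directly from (2.5); the function names and test points are illustrative assumptions.

```python
import math

def d2_phi(alpha, t):
    """phi_alpha''(t) = alpha * t**(alpha - 2), from (2.11)."""
    return alpha * t ** (alpha - 2)

def a_coeff(alpha, x, y):
    """Second derivative in x of x*phi(y/x) + y*phi(x/y)."""
    return d2_phi(alpha, x / y) / y + (y ** 2 / x ** 3) * d2_phi(alpha, y / x)

def b_coeff(alpha, x, y):
    """Mixed derivative in x and y of x*phi(y/x) + y*phi(x/y)."""
    return -(y / x ** 2) * d2_phi(alpha, y / x) - (x / y ** 2) * d2_phi(alpha, x / y)

def discriminant(alpha, x, y):
    """d(x,y) = a(x,y) a(y,x) - b(x,y)^2; expected to vanish identically."""
    return a_coeff(alpha, x, y) * a_coeff(alpha, y, x) - b_coeff(alpha, x, y) ** 2

def d2_psi(alpha, t):
    """psi''(t) = t^{-3} phi''(1/t) + phi''(t) for psi(t) = t phi(1/t) + phi(t)."""
    return t ** -3 * d2_phi(alpha, 1 / t) + d2_phi(alpha, t)

samples = [(0.5, 0.3, 1.2), (1.0, 2.0, 0.4), (1.5, 0.7, 0.9), (3.0, 1.3, 2.6)]
for alpha, x, y in samples:
    scale = a_coeff(alpha, x, y) * a_coeff(alpha, y, x)
    print(alpha, discriminant(alpha, x, y) / scale,
          a_coeff(alpha, x, y), (y ** 2 / x ** 3) * d2_psi(alpha, y / x))
```

The second column is zero up to rounding, and the last two columns agree, reflecting the identity $a(x,y) = (y^2/x^3)\,\psi''(y/x)$ used in the proof.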


Corollary 7. For any $\alpha \ge 0$, $L_{n,\alpha}$ is a non-negative
convex function on $S_n \times S_n$.

Proof. We use Theorem 7 and formulae similar to (2.6)-(2.10) for
$\Delta_{(u,v)} L_{n,\alpha}(x,y)$ on $S_n \times S_n$. We start with
$\alpha = 1$. In this case

$$\phi_1(t) = t \log t, \quad \psi_1(t) = t\,\phi_1(t^{-1}) + \phi_1(t); \quad t \in \mathbb{R}_+,$$

and thus

$$\psi_1(t) = (t-1)\log t \ge 0, \quad \psi_1''(t) = (t^{-1} + t^{-2}) > 0, \quad t \in \mathbb{R}_+.$$

On the other hand, for $\alpha \ne 1$,

$$\phi_\alpha(t) = (\alpha-1)^{-1}(t^\alpha - t), \quad \psi_\alpha(t) = t\,\phi_\alpha(t^{-1}) + \phi_\alpha(t); \quad t \in \mathbb{R}_+.$$

Therefore, for $\alpha \ge 0$, $\alpha \ne 1$,

$$\psi_\alpha(t) = (\alpha-1)^{-1}\,t^{1-\alpha}(t^{\alpha-1} - 1)(t^\alpha - 1) \ge 0; \quad t \in \mathbb{R}_+,$$

and

$$\psi_\alpha''(t) = \alpha(t^{\alpha-2} + t^{-\alpha-1}) \ge 0; \quad t \in \mathbb{R}_+.$$

This concludes the proof.
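The closed forms used in this proof can be compared with the defining expression $\psi_\alpha(t) = t\,\phi_\alpha(t^{-1}) + \phi_\alpha(t)$. The following is a minimal numerical sketch with hypothetical function names and arbitrarily chosen sample values of $\alpha$ and $t$.

```python
import math

def phi_alpha(alpha, t):
    """phi_alpha of (2.11) for t > 0."""
    if alpha == 1:
        return t * math.log(t)
    return (t ** alpha - t) / (alpha - 1)

def psi_alpha(alpha, t):
    """psi_alpha(t) = t * phi_alpha(1/t) + phi_alpha(t), as in Theorem 7."""
    return t * phi_alpha(alpha, 1 / t) + phi_alpha(alpha, t)

def psi_alpha_closed(alpha, t):
    """Closed forms of psi_alpha given in the proof of Corollary 7."""
    if alpha == 1:
        return (t - 1) * math.log(t)
    return t ** (1 - alpha) * (t ** (alpha - 1) - 1) * (t ** alpha - 1) / (alpha - 1)

def d2_psi_alpha(alpha, t):
    """Second derivative alpha * (t^(alpha-2) + t^(-alpha-1)); covers alpha = 1 too."""
    return alpha * (t ** (alpha - 2) + t ** (-alpha - 1))

alphas = [0, 0.5, 1, 1.5, 3]
ts = [0.1, 0.5, 1.0, 2.0, 10.0]
for alpha in alphas:
    for t in ts:
        print(alpha, t, psi_alpha(alpha, t), psi_alpha_closed(alpha, t),
              d2_psi_alpha(alpha, t))
```

In every row the two evaluations of $\psi_\alpha$ agree and are non-negative, and the second derivative is non-negative, vanishing only for $\alpha = 0$.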